Content On This Page:
- Mathematical Expectation (Mean) of a Random Variable $E(X) = \sum x_i P(x_i)$
- Variance of a Random Variable
- Mean and Variance of Probability Distribution (Consolidated)
Measures of Probability Distributions: Expectation and Variance
Mathematical Expectation (Mean) of a Random Variable $E(X) = \sum x_i P(x_i)$
Definition and Concept
The **Mathematical Expectation**, also known as the **Expected Value**, or simply the **mean**, of a random variable is a fundamental measure of central tendency for a probability distribution. It represents the theoretical average value that we would expect to obtain if the random experiment associated with the random variable were repeated an infinite number of times.
For a **discrete random variable** $X$, the expected value is a weighted average of all possible values that $X$ can take, where the weights are the probabilities of $X$ taking on those values.
The expected value of $X$ is denoted by $E(X)$ or the Greek letter $\mu$ (mu).
Formula for a Discrete Random Variable
Let $X$ be a discrete random variable with a probability distribution defined by its possible values $x_1, x_2, x_3, \dots$ and their corresponding probabilities $P(X=x_1) = p_1, P(X=x_2) = p_2, P(X=x_3) = p_3, \dots$. The set of possible values may be finite or countably infinite.
The expected value $E(X)$ is calculated by multiplying each possible value $x_i$ by its probability $p_i$ and summing these products over all possible values of $X$.
Formula for Expected Value:
$$E(X) = x_1 p_1 + x_2 p_2 + x_3 p_3 + \dots$$
... (1)
Using summation notation, this can be written concisely as:
$$E(X) = \sum x_i P(X=x_i) = \sum x_i p_i$$
... (2)
The summation is performed over all possible values $x_i$ in the range of the random variable $X$. If the number of possible values is infinite, this is an infinite series, which must converge (absolutely) for the expected value to exist.
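As a concrete illustration of Formula (2), here is a minimal Python sketch (the function name `expected_value` and the sample distribution are illustrative choices, not from the source) that computes the weighted sum $\sum x_i p_i$ for a finite distribution.

```python
def expected_value(values, probs):
    """Return E(X) = sum of x_i * p_i for a finite discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Illustrative distribution: X takes the values 1, 2, 3 with probabilities 0.2, 0.3, 0.5
print(expected_value([1, 2, 3], [0.2, 0.3, 0.5]))  # ~2.3
```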
Interpretation
- $E(X)$ is the theoretical mean of the probability distribution of $X$. It's analogous to the mean of a dataset, but calculated from the probabilities of the outcomes rather than observed frequencies.
- It represents the "center of mass" of the probability distribution. If you were to place weights (probabilities) on a number line at the locations of the possible values ($x_i$), the expected value would be the balance point.
- In the long run, if you were to repeat the random experiment a very large number of times and record the values of $X$, the average of these recorded values would approach $E(X)$ (Law of Large Numbers).
- The expected value does not necessarily have to be one of the possible values that the random variable $X$ can take. For example, the expected value of the outcome when rolling a fair six-sided die is 3.5, which is not a possible outcome of a single roll.
Example
Example 1. Find the expected value (mean) of the number of heads obtained when tossing two fair coins.
Answer:
Given: Random variable X = number of heads obtained when tossing two fair coins.
To Find: The expected value $E(X)$.
Solution:
First, we need the probability distribution of $X$ (derived in the previous section). Tossing two fair coins gives four equally likely outcomes (HH, HT, TH, TT), so the distribution of the number of heads is:
| Value of X ($x_i$) (No. of Heads) | Probability $P(X=x_i) = p_i$ |
|---|---|
| 0 | 1/4 |
| 1 | 1/2 |
| 2 | 1/4 |
The possible values are $x_1=0, x_2=1, x_3=2$ with probabilities $p_1=1/4, p_2=1/2, p_3=1/4$.
Using the formula $E(X) = \sum x_i p_i$ (Formula 2):
$$E(X) = x_1 p_1 + x_2 p_2 + x_3 p_3$$
... (iii)
Substitute the values:
$$E(X) = (0 \times \frac{1}{4}) + (1 \times \frac{1}{2}) + (2 \times \frac{1}{4})$$
$$E(X) = 0 + \frac{1}{2} + \frac{2}{4}$$
$$E(X) = 0 + \frac{1}{2} + \frac{1}{2}$$
($2/4 = 1/2$)
$$E(X) = 1$$
... (iv)
The expected number of heads when tossing two fair coins is 1.
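This result can also be checked by enumerating the four equally likely outcomes of the two tosses directly. A small sketch of that check, using Python's standard `itertools` and `fractions` modules (the variable names are illustrative):

```python
from itertools import product
from fractions import Fraction

# Enumerate the 4 equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))   # ('H','H'), ('H','T'), ('T','H'), ('T','T')
p_each = Fraction(1, len(outcomes))        # each outcome has probability 1/4

# E(X) = sum over outcomes of (number of heads) * P(outcome)
e_x = sum(outcome.count("H") * p_each for outcome in outcomes)
print(e_x)  # 1
```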
Example 2. A fair six-sided die is rolled. Let $X$ be the random variable representing the outcome (the number facing up). Find the expected value $E(X)$.
Answer:
Given: Random variable X = outcome of rolling a fair six-sided die.
To Find: The expected value $E(X)$.
Solution:
The possible values for $X$ are the numbers on the faces: $\{1, 2, 3, 4, 5, 6\}$.
Since the die is fair, each outcome is equally likely. The probability of each value $x_i$ is $P(X=x_i) = 1/6$.
The probability distribution is: $P(X=1)=1/6, P(X=2)=1/6, \dots, P(X=6)=1/6$.
Using the formula $E(X) = \sum x_i p_i$:
$$E(X) = (1 \times \frac{1}{6}) + (2 \times \frac{1}{6}) + (3 \times \frac{1}{6}) + (4 \times \frac{1}{6}) + (5 \times \frac{1}{6}) + (6 \times \frac{1}{6})$$
... (v)
We can factor out the common probability $1/6$:
$$E(X) = \frac{1}{6} (1 + 2 + 3 + 4 + 5 + 6)$$
... (vi)
Sum the numbers inside the bracket:
$$1 + 2 + 3 + 4 + 5 + 6 = 21$$
... (vii)
Substitute the sum back into (vi):
$$E(X) = \frac{1}{6} (21) = \frac{21}{6}$$
... (viii)
Simplify the fraction by dividing numerator and denominator by 3:
$$E(X) = \frac{\cancel{21}^{7}}{\cancel{6}_{2}} = \frac{7}{2}$$
... (ix)
$$E(X) = 3.5$$
... (x)
The expected value when rolling a fair die is 3.5.
This example shows that the expected value is a theoretical average and may not be one of the possible outcomes of a single trial.
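The Law of Large Numbers interpretation mentioned earlier can be illustrated with a quick simulation: in the rough sketch below (the number of rolls and the random seed are arbitrary choices), the average of many simulated fair-die rolls comes out close to the theoretical value 3.5.

```python
import random

random.seed(0)            # fixed seed so the run is reproducible
n_rolls = 100_000
total = 0
for _ in range(n_rolls):
    total += random.randint(1, 6)   # one roll of a fair six-sided die

print(total / n_rolls)    # close to the theoretical E(X) = 3.5
```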
Variance of a Random Variable
Definition and Concept
The **Variance** of a random variable $X$ is a measure of the dispersion or spread of its probability distribution around its expected value (mean), $\mu = E(X)$. It quantifies the expected value of the squared difference between the random variable's possible values and its mean.
The variance is denoted by $Var(X)$ or $\sigma^2$.
A high variance indicates that the possible values of $X$ are widely spread out from the mean, meaning the outcomes are highly variable. A low variance indicates that the possible values are clustered closely around the mean, meaning the outcomes are more consistent.
As with the variance of a dataset, squaring the deviations ensures that positive and negative deviations do not cancel out, and it gives more weight to extreme deviations. The units of variance are the square of the units of the random variable.
Formula for a Discrete Random Variable
Let $X$ be a discrete random variable with possible values $x_1, x_2, \dots$ and corresponding probabilities $P(X=x_1) = p_1, P(X=x_2) = p_2, \dots$. Let $\mu = E(X)$ be the mean of the distribution.
The variance $Var(X)$ is defined as the expected value of the squared deviation $(X - \mu)^2$. Using the definition of expected value for a discrete random variable (Formula 2 from Section I1), we get:
Definition Formula for Variance:
$$Var(X) = E[(X - \mu)^2] = \sum (x_i - \mu)^2 P(X=x_i) = \sum (x_i - \mu)^2 p_i$$
... (1)
The summation is over all possible values $x_i$ of $X$. This formula is useful conceptually but can be cumbersome for calculations if the mean $\mu$ is not a simple number.
Computational Formula (Shortcut Formula):
An algebraically equivalent formula for variance that is often easier to use in practice is:
$$Var(X) = E(X^2) - [E(X)]^2$$
... (2)
This formula states that the variance is equal to the expected value of $X^2$ minus the square of the expected value of $X$.
To use this formula, you need to calculate $E(X)$ first (as shown in Section I1), and then calculate $E(X^2)$.
$E(X^2)$ is the expected value of the random variable $X^2$. The possible values of $X^2$ are $x_1^2, x_2^2, \dots$, and the probability of $X^2$ taking the value $x_i^2$ is the same as $X$ taking the value $x_i$, which is $p_i = P(X=x_i)$.
Using the expected value formula (Section I1), $E(X^2)$ is calculated as:
$$E(X^2) = \sum x_i^2 P(X=x_i) = \sum x_i^2 p_i$$
... (3)
So, the computational formula (Formula 2) becomes:
$$Var(X) = \sum x_i^2 p_i - \left(\sum x_i p_i\right)^2$$
... (4)
This formula is often computationally simpler because it avoids calculating deviations $(x_i - \mu)$ for each value $x_i$.
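Both the definition formula (1) and the computational formula (4) translate almost directly into code. The sketch below (function names and the sample distribution are illustrative, not from the source) computes the variance both ways; the two results agree up to floating-point rounding.

```python
def variance_definition(values, probs):
    """Var(X) = sum of (x_i - mu)^2 * p_i  (definition formula)."""
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

def variance_computational(values, probs):
    """Var(X) = E(X^2) - [E(X)]^2  (computational formula)."""
    mu = sum(x * p for x, p in zip(values, probs))
    e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))
    return e_x2 - mu ** 2

values, probs = [1, 2, 3], [0.2, 0.3, 0.5]    # illustrative distribution
print(variance_definition(values, probs))      # ~0.61
print(variance_computational(values, probs))   # ~0.61 (same value)
```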
Derivation of the Computational Formula:
Start with the definition formula $Var(X) = E[(X - \mu)^2]$ and expand the square inside the expectation: $(X - \mu)^2 = X^2 - 2\mu X + \mu^2$. We will also use the linearity of expectation (which states $E(aY + bZ) = aE(Y) + bE(Z)$ for constants $a, b$ and random variables $Y, Z$, and $E(c)=c$ for a constant $c$):
$$Var(X) = E[X^2 - 2\mu X + \mu^2]$$
... (v)
Apply linearity of expectation:
$$Var(X) = E(X^2) - E(2\mu X) + E(\mu^2)$$
Since $\mu$ is a constant ($E(X)$ is a fixed number) and 2 is a constant:
$$Var(X) = E(X^2) - 2\mu E(X) + \mu^2$$
Substitute $\mu = E(X)$:
$$Var(X) = E(X^2) - 2\mu(\mu) + \mu^2$$
$$Var(X) = E(X^2) - 2\mu^2 + \mu^2$$
$$Var(X) = E(X^2) - \mu^2$$
... (vi)
Writing $\mu$ as $E(X)$ in equation (vi) gives the standard computational formula:
$$Var(X) = E(X^2) - [E(X)]^2$$
(Computational Formula Derived)
Standard Deviation of a Random Variable
The **Standard Deviation** of a discrete random variable $X$, denoted by $SD(X)$ or $\sigma$, is defined as the positive square root of the variance.
$$SD(X) = \sigma = \sqrt{Var(X)} = \sqrt{\sigma^2}$$
... (5)
The standard deviation is often preferred over variance because it is expressed in the **same units** as the random variable $X$ and its mean $E(X)$. This makes it more directly interpretable as a measure of the typical distance of the outcomes from the mean.
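As a quick illustration that the standard deviation is reported in the same units as $X$, here is a minimal sketch (the distribution is an arbitrary illustrative choice) that computes the variance and then takes its positive square root:

```python
import math

values, probs = [1, 2, 3], [0.2, 0.3, 0.5]                    # illustrative distribution
mu = sum(x * p for x, p in zip(values, probs))                 # E(X), same units as X
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))    # Var(X), squared units of X
sd = math.sqrt(var)                                            # SD(X), same units as X
print(mu, var, sd)   # ~2.3, ~0.61, ~0.781
```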
Example
Example 1. Find the variance and standard deviation of the number of heads obtained when tossing two fair coins.
Answer:
Given: Random variable X = number of heads when tossing two fair coins.
To Find: Variance $Var(X)$ and Standard Deviation $SD(X)$.
Solution:
From Example 1, Section I1, the probability distribution for $X$ is:
| Value of X ($x_i$) | Probability $P(X=x_i) = p_i$ |
|---|---|
| 0 | 1/4 |
| 1 | 1/2 |
| 2 | 1/4 |
We also calculated the mean $E(X) = \mu = 1$ in Example 1, Section I1.
Method 1: Using the Definition Formula $Var(X) = \sum (x_i - \mu)^2 p_i$.
Calculate the squared deviation $(x_i - \mu)^2 = (x_i - 1)^2$ and multiply by $p_i$ for each $x_i$:
- For $x_i=0$: $(0 - 1)^2 = (-1)^2 = 1$. Term $= (1) \times \frac{1}{4} = \frac{1}{4}$.
- For $x_i=1$: $(1 - 1)^2 = (0)^2 = 0$. Term $= (0) \times \frac{1}{2} = 0$.
- For $x_i=2$: $(2 - 1)^2 = (1)^2 = 1$. Term $= (1) \times \frac{1}{4} = \frac{1}{4}$.
Summing these terms:
$$Var(X) = \frac{1}{4} + 0 + \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$$
... (vii)
Method 2: Using the Computational Formula $Var(X) = E(X^2) - [E(X)]^2$.
We know $E(X) = 1$, so $[E(X)]^2 = (1)^2 = 1$.
First, calculate $E(X^2) = \sum x_i^2 p_i$. Calculate $x_i^2$ and multiply by $p_i$ for each $x_i$:
- For $x_i=0$: $x_i^2 = 0^2 = 0$. Term $= (0) \times \frac{1}{4} = 0$.
- For $x_i=1$: $x_i^2 = 1^2 = 1$. Term $= (1) \times \frac{1}{2} = \frac{1}{2}$.
- For $x_i=2$: $x_i^2 = 2^2 = 4$. Term $= (4) \times \frac{1}{4} = \frac{4}{4} = 1$.
$$E(X^2) = 0 + \frac{1}{2} + 1 = \frac{1}{2} + \frac{2}{2} = \frac{3}{2}$$
... (viii)
Now apply the computational formula $Var(X) = E(X^2) - [E(X)]^2$:
$$Var(X) = \frac{3}{2} - (1)^2 = \frac{3}{2} - 1 = \frac{3}{2} - \frac{2}{2} = \frac{1}{2}$$
... (ix)
Both methods yield the same variance. $Var(X) = \frac{1}{2}$.
Calculate Standard Deviation:
The standard deviation is the positive square root of the variance.
$$SD(X) = \sigma = \sqrt{Var(X)} = \sqrt{\frac{1}{2}}$$
... (x)
We can rationalize the denominator:
$$\sigma = \frac{\sqrt{1}}{\sqrt{2}} = \frac{1}{\sqrt{2}} = \frac{1 \times \sqrt{2}}{\sqrt{2} \times \sqrt{2}} = \frac{\sqrt{2}}{2}$$
... (xi)
The variance of the number of heads is $\frac{1}{2}$, and the standard deviation is $\frac{1}{\sqrt{2}}$ or $\frac{\sqrt{2}}{2}$.
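The same numbers can be reproduced exactly with fractions; a short sketch of the computational-formula route for this example:

```python
from fractions import Fraction as F

values = [0, 1, 2]
probs = [F(1, 4), F(1, 2), F(1, 4)]   # distribution of the number of heads

mu = sum(x * p for x, p in zip(values, probs))        # E(X)   = 1
e_x2 = sum(x * x * p for x, p in zip(values, probs))  # E(X^2) = 3/2
var = e_x2 - mu ** 2                                  # Var(X) = 1/2
print(mu, e_x2, var)       # 1, 3/2, 1/2
print(float(var) ** 0.5)   # SD(X) ~ 0.7071 = 1/sqrt(2)
```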
Mean and Variance of Probability Distribution (Consolidated)
The **mean (Expected Value)** and **variance** are the two most fundamental parameters that characterize a probability distribution. The mean provides a measure of the central location of the distribution, while the variance quantifies its spread or variability. Together, they offer essential insights into the behavior of a random variable.
Summary Formulas for Discrete Random Variables
Let $X$ be a discrete random variable with possible values $x_1, x_2, x_3, \dots$ and corresponding probabilities $P(X=x_i) = p_i$. The set of possible values may be finite or countably infinite, provided the sums converge.
1. Mean (Expected Value)
The mean, denoted by $E(X)$ or $\mu$, is the theoretical average value of the random variable. It is calculated as the sum of each possible value multiplied by its probability:
$$\mu = E(X) = \sum x_i P(X=x_i) = \sum x_i p_i$$
... (1)
The summation is over all possible values of $X$.
2. Variance
The variance, denoted by $Var(X)$ or $\sigma^2$, measures the expected squared deviation of the random variable from its mean. It describes the spread of the distribution.
The definition formula for variance is:
$$\sigma^2 = Var(X) = E[(X - \mu)^2] = \sum (x_i - \mu)^2 P(X=x_i) = \sum (x_i - \mu)^2 p_i$$
... (2)
A more commonly used formula for calculation is the computational (shortcut) formula:
$$\sigma^2 = Var(X) = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2$$
... (3)
Where $E(X^2)$ is the expected value of $X^2$, calculated as:
$$E(X^2) = \sum x_i^2 P(X=x_i) = \sum x_i^2 p_i$$
... (4)
3. Standard Deviation
The standard deviation, denoted by $SD(X)$ or $\sigma$, is the positive square root of the variance. It is the most common measure of spread and is in the same units as the random variable.
$$\sigma = SD(X) = \sqrt{Var(X)} = \sqrt{\sigma^2}$$
... (5)
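The three summary formulas translate into one small helper. The sketch below (the function name `summarize` is an illustrative choice) returns the mean, variance, and standard deviation of any finite discrete distribution, and reproduces the two-coin results from the earlier example.

```python
import math

def summarize(values, probs):
    """Return (mean, variance, standard deviation) of a finite discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))                     # formula (1)
    var = sum(x * x * p for x, p in zip(values, probs)) - mu ** 2      # formulas (3) and (4)
    return mu, var, math.sqrt(var)                                     # formula (5)

# Number of heads in two fair coin tosses: mean 1, variance 0.5, sd ~0.707
print(summarize([0, 1, 2], [0.25, 0.5, 0.25]))
```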
Properties of Expectation and Variance
Expected value and variance have several important properties that are useful when working with random variables and transformations of random variables. Let $X$ and $Y$ be random variables and $a, b, c$ be constants.
Properties of Expectation:
- The expected value of a constant is the constant itself:
$$E(c) = c$$
... (6)
- The expected value of a constant times a random variable is the constant times the expected value:
$$E(aX) = a E(X)$$
... (7)
- The expected value of a linear transformation of a random variable is the linear transformation of the expected value:
$$E(aX + b) = a E(X) + b$$
... (8)
- The expected value of the sum of two random variables is the sum of their expected values (this holds regardless of whether X and Y are independent):
$$E(X + Y) = E(X) + E(Y)$$
... (9)
- The expected value of the product of two random variables is the product of their expected values **if $X$ and $Y$ are independent**:
$$E(XY) = E(X) E(Y) \quad \text{if X and Y are independent}$$
... (10)
(Note: This property does NOT hold for dependent variables in general).
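These expectation properties can be verified numerically by brute-force enumeration. The sketch below uses two independent fair dice as an illustrative choice and checks properties (8), (9), and (10) with exact fractions.

```python
from itertools import product
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 36)   # each ordered pair of two independent fair dice is equally likely

def E(f):
    """E[f(X, Y)] summed over all equally likely outcomes of two independent dice."""
    return sum(f(x, y) * p for x, y in product(faces, faces))

a, b = 3, 5
print(E(lambda x, y: a * x + b) == a * E(lambda x, y: x) + b)          # (8)  True
print(E(lambda x, y: x + y) == E(lambda x, y: x) + E(lambda x, y: y))  # (9)  True
print(E(lambda x, y: x * y) == E(lambda x, y: x) * E(lambda x, y: y))  # (10) True (independent)
```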
Properties of Variance:
- The variance of a constant is 0 (a constant has no variability):
$$Var(c) = 0$$
... (11)
- The variance of a constant times a random variable is the square of the constant times the variance of the random variable:
$$Var(aX) = a^2 Var(X)$$
... (12)
(This is because the standard deviation $SD(aX) = |a|SD(X)$, and variance is the square of standard deviation).
- Adding a constant to a random variable does not change its variance (shifting the distribution does not change its spread):
$$Var(X + b) = Var(X)$$
... (13)
- Combining properties (12) and (13) for a linear transformation:
$$Var(aX + b) = a^2 Var(X)$$
... (14)
- The variance of the sum of two **independent** random variables is the sum of their variances:
$$Var(X + Y) = Var(X) + Var(Y) \quad \text{if X and Y are independent}$$
... (15)
- The variance of the difference of two **independent** random variables is also the sum of their variances:
$$Var(X - Y) = Var(X) + Var(Y) \quad \text{if X and Y are independent}$$
... (16)
(Proof: $Var(X - Y) = Var(X + (-1)Y) = Var(X) + (-1)^2 Var(Y) = Var(X) + 1 \cdot Var(Y) = Var(X) + Var(Y)$). Note that this property also requires independence.
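A similar brute-force check (again using two independent fair dice as an illustrative choice) confirms the variance properties (14), (15), and (16):

```python
from itertools import product
from fractions import Fraction

faces, p = range(1, 7), Fraction(1, 36)   # two independent fair dice, 36 ordered pairs

def E(f):
    return sum(f(x, y) * p for x, y in product(faces, faces))

def Var(f):
    """Var[f(X, Y)] = E[f^2] - (E[f])^2."""
    return E(lambda x, y: f(x, y) ** 2) - E(f) ** 2

a, b = 3, 5
vx, vy = Var(lambda x, y: x), Var(lambda x, y: y)
print(Var(lambda x, y: a * x + b) == a ** 2 * vx)   # (14) True
print(Var(lambda x, y: x + y) == vx + vy)           # (15) True
print(Var(lambda x, y: x - y) == vx + vy)           # (16) True
```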
These properties are fundamental for working with linear combinations of random variables and are widely used in deriving properties of probability distributions and in statistical inference.